Open Problem: Lower bounds for Boosting with Hadamard Matrices

Authors

  • Jiazhong Nie
  • Manfred K. Warmuth
  • S. V. N. Vishwanathan
  • Xinhua Zhang
Abstract

Boosting algorithms can be viewed as a zero-sum game. At each iteration a new column/hypothesis is chosen from a game matrix representing the entire hypothesis class. There are algorithms for which the gap between the value of the sub-matrix (the $t$ columns chosen so far) and the value of the entire game matrix is $O(\sqrt{\log n / t})$. A matching lower bound has been shown for random game matrices for $t$ up to $n^{\alpha}$, where $\alpha \in (0, \frac{1}{2})$. We conjecture that with Hadamard matrices we can build a certain game matrix for which the game value grows at the slowest possible rate for $t$ up to a fraction of $n$.

1. Boosting as a zero-sum game

Boosting algorithms follow the same protocol in each iteration (e.g. Freund and Schapire, 1997; Freund, 1995): the algorithm provides a distribution $d$ on a given set of $n$ examples. Then an oracle provides a "weak hypothesis" from some hypothesis class, and the distribution is updated. At the end, the algorithm outputs a convex combination $w$ of the hypotheses it received from the oracle.

One can view Boosting as a zero-sum game between a row player and a column player (Freund and Schapire, 1997). Each possible hypothesis provided by the oracle is a column chosen from an underlying game matrix $U$ that represents the entire hypothesis class available to the oracle. The examples correspond to the rows of this matrix. At the end of iteration $t$, the algorithm has received $t$ columns/hypotheses, and we use $U_t$ to denote this sub-matrix of $U$. The minimax value of $U_t$ is defined as follows:

$$\mathrm{val}(U_t) = \min_{d \in S_n} \max_{w \in S_t} d^\top U_t w = \max_{w \in S_t} \min_{r=1,\ldots,n} [U_t w]_r. \tag{1}$$

Here $d$ is the distribution on the rows/examples, $w$ represents a convex combination of the $t$ columns of $U_t$, and $S_n$, $S_t$ denote the probability simplices of the corresponding dimensions. Finally, $[U_t w]_r$ is the margin of row/example $r$ with respect to the convex combination $w$ of the current hypothesis set. So in Boosting the value of $U_t$ is the maximum minimum margin of all examples achievable with the current $t$ columns of $U_t$.
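Both sides of definition (1) can be evaluated directly for a fixed pair of strategies, which gives a simple way to certify a game value by hand. The following is a minimal sketch, not taken from the paper: the matrix `U_t`, the strategies `w` and `d`, and the helper names are all hypothetical, chosen only for illustration. For any $w$ the minimum margin is a lower bound on $\mathrm{val}(U_t)$, and for any $d$ the best single-column payoff is an upper bound, since a linear function over the simplex is maximized at a vertex.

```python
from typing import List

Matrix = List[List[float]]


def margins(U: Matrix, w: List[float]) -> List[float]:
    """Return U w: the margin of every row/example under the combination w."""
    return [sum(u * wj for u, wj in zip(row, w)) for row in U]


def min_margin(U: Matrix, w: List[float]) -> float:
    """Inner minimum of the max-min form in (1); a lower bound on val(U)."""
    return min(margins(U, w))


def best_column_payoff(U: Matrix, d: List[float]) -> float:
    """Inner maximum of the min-max form in (1); an upper bound on val(U).

    The maximum of d^T U w over the simplex is attained at a single column.
    """
    n_cols = len(U[0])
    return max(sum(d[r] * U[r][j] for r in range(len(U))) for j in range(n_cols))


# Hypothetical matrix: 3 examples (rows), 2 hypotheses (columns), +-1 entries.
U_t = [[1, -1],
       [-1, 1],
       [1, 1]]
w = [0.5, 0.5]        # convex combination of the two columns
d = [0.5, 0.5, 0.0]   # distribution on the three rows

lower = min_margin(U_t, w)
upper = best_column_payoff(U_t, d)
print(lower, upper)   # the two bounds coincide, so they pin down val(U_t)
```

When the two bounds agree, as they do for this toy matrix, the pair $(d, w)$ is optimal and the common number is the value of the game. For larger matrices one would typically solve a linear program instead of guessing the strategies.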
The value of $U_t$ increases as columns are added, and in this view of Boosting the goal is to raise the value of $U_t$ as quickly as possible to the value of the entire underlying game matrix $U$. There are boosting algorithms that guarantee that after $O(\log n / \epsilon^2)$ iterations the gap $\mathrm{val}(U) - \mathrm{val}(U_t)$ is at most $\epsilon$ (Freund and Schapire, 1997; Rätsch and Warmuth, 2005; Warmuth et al., 2008). In other words, the gap at iteration $t$ is at most $O(\sqrt{\log n / t})$.

Here we are interested in finding game matrices with a matching lower bound for the value gap. The lower bound should hold for any boosting algorithm, and therefore the gap in this case is defined via the maximum value over all sub-matrices $U_t$ of $t$ columns of $U$ (see footnote 1):

$$\mathrm{gap}_t(U) := \mathrm{val}(U) - \max_{U_t} \mathrm{val}(U_t).$$

First notice that the gap is non-zero only when $t \le n$, since for any $n \times m$ game matrix with $m > n$, the value is always attained by one of its sub-matrices of size $n \times (n+1)$. This follows from Carathéodory's theorem, which implies that for any column player $w \in S_m$ there is a $\hat{w}$ with support of size at most $n+1$ satisfying $Uw = U\hat{w}$. So w.l.o.g. $m \le n$.

Klein and Young (1999) showed that for a limited range of $t$ ($\log n \le t \le n^{\alpha}$ with $\alpha \in (0, \frac{1}{2})$), the gap is $\Omega(\sqrt{\log n / t})$ with high probability for random bit matrices $U$ (see footnote 2). We claim that with certain game matrices the range of $t$ in this lower bound can be increased.

© 2013 J. Nie, M.K. Warmuth, S. Vishwanathan & X. Zhang.

2. Lower bounds with Hadamard matrices

Hadamard matrices have been used before for proving hardness results in Machine Learning (e.g. Kivinen et al., 1997; Warmuth and Vishwanathan, 2005) and for iteratively constructing game matrices with large gaps (Nemirovski and Yudin, 1983; Ben-Tal et al., 2001). We begin by giving a simple but weak lower bound using these matrices (an adaptation of Proposition 4.2 of Ben-Tal et al. (2001)). Let $n = 2^k$ and let $H$ be the $n \times n$ Hadamard matrix. Define $\hat{H}$ to be $H$ with its first row removed.
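To make the objects $H$ and $\hat{H}$ concrete, here is a small sketch that builds a Hadamard matrix of order $n = 2^k$ and drops its first row. It assumes the Sylvester (recursive doubling) construction, whose first row is all ones; the text above does not say which Hadamard matrix is meant, so this choice and the helper names `sylvester_hadamard` and `gram` are assumptions made for illustration.

```python
def sylvester_hadamard(k):
    """Return the 2^k x 2^k Sylvester Hadamard matrix as a list of rows.

    Assumption: the Sylvester construction H_{2m} = [[H, H], [H, -H]].
    """
    H = [[1]]
    for _ in range(k):
        top = [row + row for row in H]
        bottom = [row + [-x for x in row] for row in H]
        H = top + bottom
    return H


def gram(M):
    """Return M^T M, the matrix of inner products between columns of M."""
    cols = list(zip(*M))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


k = 3
H = sylvester_hadamard(k)
n = len(H)        # n = 2^k
H_hat = H[1:]     # H with its first row removed: (n - 1) x n

# The first row is all ones in this construction.
assert all(x == 1 for x in H[0])
# Columns of H are orthogonal with squared norm n.
assert gram(H) == [[n if i == j else 0 for j in range(n)] for i in range(n)]
# Every remaining row is orthogonal to the all-ones row, so it sums to zero.
assert all(sum(row) == 0 for row in H_hat)
```

The zero row sums of `H_hat` depend on the removed row being all ones, which is why a normalized Hadamard matrix is assumed here.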
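We use the game matrix $U = \begin{bmatrix} \hat{H} \\ -\hat{H} \end{bmatrix}$ and let $\mathrm{val}_D(U)$ denote $\mathrm{val}\left(\begin{bmatrix} U \\ -U \end{bmatrix}\right)$. Notice that by definition (1), $\mathrm{val}_D(U) = -\min_{w \in S_n} \|Uw\|_\infty \le 0$.

Theorem. For $1 \le t \le \frac{n}{2}$,

$$\mathrm{val}_D(\hat{H}) - \max_{\hat{H}_t} \mathrm{val}_D(\hat{H}_t) \ge \sqrt{\frac{1}{2t}},$$

where the maximum is over all sub-matrices $\hat{H}_t$ of $t$ columns of $\hat{H}$.

Proof. First we show $\mathrm{val}_D(\hat{H}) = 0$. Notice that every row of $\hat{H}$ sums to zero, so

$$\mathrm{val}_D(\hat{H}) = -\min_{w \in S_n} \|\hat{H} w\|_\infty \ge -\left\|\hat{H}\, \tfrac{1}{n}\mathbf{1}_n\right\|_\infty = 0,$$

and together with $\mathrm{val}_D(\hat{H}) \le 0$ this gives $\mathrm{val}_D(\hat{H}) = 0$. Since $H$ has orthogonal columns, we have for any $\hat{H}_t$ that $\hat{H}_t^\top \hat{H}_t = n I_t - \mathbf{1}_t \mathbf{1}_t^\top$ and

$$\min_{w \in S_t} \|\hat{H}_t w\|_\infty \ge \min_{w \in S_t} \frac{\|\hat{H}_t w\|_2}{\sqrt{n-1}} = \min_{w \in S_t} \sqrt{\frac{w^\top \hat{H}_t^\top \hat{H}_t w}{n-1}} = \min_{w \in S_t} \sqrt{\frac{n}{n-1}\, w^\top w - \frac{1}{n-1}} \ge \sqrt{\frac{n-t}{(n-1)\,t}}.$$

Footnote 1. Freund (1995) originally gave an adversarial oracle that iteratively produces a hypothesis of error $\frac{1}{2} - \epsilon$ w.r.t. the current distribution, and for any particular algorithm, the oracle can make this go on for $\Omega(\log n / \epsilon^2)$ iterations. A lower bound of $\Omega(\sqrt{(\log n)/t})$ on the value gap is a much stronger type of lower bound.

Footnote 2. The same lower bound translates to random $\pm 1$ matrices via shifting and scaling.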
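The proof ends at a bound in terms of $n$ and $t$, while the theorem is stated with $\sqrt{1/(2t)}$. The remaining steps can be spelled out as follows; this is a sketch of our reading of the argument, using only the quantities defined above (the Cauchy-Schwarz step for $w^\top w$ is not stated explicitly in the text).

```latex
\begin{align*}
w \in S_t \;&\Rightarrow\; 1 = \Big(\sum_{i=1}^{t} w_i\Big)^{2}
   \le t \sum_{i=1}^{t} w_i^{2} = t\, w^\top w
   \;\Rightarrow\; w^\top w \ge \frac{1}{t}, \\[4pt]
\frac{n}{n-1}\, w^\top w - \frac{1}{n-1}
   \;&\ge\; \frac{n}{(n-1)\,t} - \frac{1}{n-1}
   = \frac{n-t}{(n-1)\,t}, \\[4pt]
t \le \frac{n}{2} \;&\Rightarrow\; \frac{n-t}{n-1} \ge \frac{n/2}{n-1} \ge \frac{1}{2}
   \;\Rightarrow\; \sqrt{\frac{n-t}{(n-1)\,t}} \ge \sqrt{\frac{1}{2t}}, \\[4pt]
\mathrm{val}_D(\hat{H}) - \mathrm{val}_D(\hat{H}_t)
   \;&=\; 0 + \min_{w \in S_t} \|\hat{H}_t w\|_\infty \;\ge\; \sqrt{\frac{1}{2t}}
   \quad \text{for every } \hat{H}_t .
\end{align*}
```

The identity $w^\top \hat{H}_t^\top \hat{H}_t w = n\, w^\top w - (\mathbf{1}_t^\top w)^2 = n\, w^\top w - 1$ is what turns the Gram matrix $n I_t - \mathbf{1}_t \mathbf{1}_t^\top$ into the expression under the square root, and the factor $\sqrt{n-1}$ comes from $\hat{H}_t w$ having $n-1$ entries.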

Similar resources

General Lower Bounds on Maximal Determinants of Binary Matrices

We prove general lower bounds on the maximal determinant of n × n {+1,−1}-matrices, both with and without the assumption of the Hadamard conjecture. Our bounds improve on earlier results of de Launey and Levin (2010) and, for certain congruence classes of n mod 4, the results of Koukouvinos, Mitrouli and Seberry (2000). In an Appendix we give a new proof, using Jacobi’s determinant identity, of a...

Switching Operations for Hadamard Matrices

We define several operations that switch substructures of Hadamard matrices thereby producing new, generally inequivalent, Hadamard matrices. These operations have application to the enumeration and classification of Hadamard matrices. To illustrate their power, we use them to greatly improve the lower bounds on the number of equivalence classes of Hadamard matrices in orders 32 and 36 to 3,578...

Meeting the Welch and Karystinos-Pados Bounds on DS-CDMA Binary Signature Sets

The Welch lower bound on the total-squared-correlation (TSC) of binary signature sets is loose for binary signature sets whose length L is not a multiple of 4. Recently Karystinos and Pados developed new bounds that are better than the Welch bound in those cases, and showed how to achieve the bounds with modified Hadamard matrices except in a couple of cases. In this paper, we study the open cases.

Probabilistic lower bounds on maximal determinants of binary matrices

Let D(n) be the maximal determinant for n × n {±1}-matrices, and R(n) = D(n)/n^{n/2} be the ratio of D(n) to the Hadamard upper bound. Using the probabilistic method, we prove new lower bounds on D(n) and R(n) in terms of the distance d to the nearest (smaller) Hadamard matrix, defined by d = n − h, where h is the order of a Hadamard matrix and h is maximal subject to h ≤ n. The lower bounds on R(n) ...

Publication date: 2013